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Abstract. We present a Projection onto Convex Sets (POCS) type algorithm for 
solving systems of linear equations. POCS methods have found many applications 
ranging from computer tomography to digital signal and image processing. The 
Kaczmarz method is one of the most popular solvers for overdetermined systems 
of linear equations due to its speed and simplicity. Here we introduce and analyze 
an extension of the Kaczmarz method which itcrativcly projects the estimate onto 
a solution space given from two randomly selected rows. We show that this pro- 
jection algorithm provides exponential convergence to the solution in expectation. 
The convergence rate significantly improves upon that of the standard random- 
ized Kaczmarz method when the system has coherent rows. We also show that 
the method is robust to noise, and converges exponentially in expectation to the 
noise floor. Experimental results are provided which confirm that in the coherent 
case our method significantly outperforms the randomized Kaczmarz method. 



1. Introduction 

We consider a consistent system of linear equations of the form 

Ax = b, 

where b G C m and A G C mxn is a full-rank m x n matrix that is overdetermined, 
having more rows than columns (m > n). When the number of rows of A is large, 
it is far too costly to invert the matrix to solve for x, so one may utilize an iterative 
solver such as the Projection onto Convex Sets (POCS) method, used in many 
applications of signal and image processing [TJ [IS]. The Kaczmarz method is often 
preferred, iteratively cycling through the rows of A and orthogonally projecting the 
estimate onto the solution space given by each row [10]. Precisely, let us denote 
by cti, a 2 , . . ., a m the rows of A and b\, 62, • • ■> b m the coordinates of b. For 
simplicity, we will assume throughout that the matrix A is standardized, meaning 
that each of its rows has unit Euclidean norm; generalizations from this case will 
be straightforward. Given some trivial initial estimate x , the Kaczmarz method 
cycles through the rows of A and in the kth iteration projects the previous estimate 
Xk onto the solution hyperplane of (a^, x) = hi where i = k mod m, 

x k +i = x k + - (a<, x k ))a,i. 
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Theoretical results about the rate of convergence of the Kaczmarz method have 
been difficult to obtain, and most are based on quantities which are themselves hard 
to compute [3j [7]. Even more importantly, the method as we have just described 
depends heavily on the ordering of the rows of A. A malicious or unlucky ordering 
may therefore lead to extremely slow convergence. To overcome this, one can select 
the rows of A in a random fashion rather than cyclically [HI [12] . Strohmer and Ver- 
shynin analyzed a randomized version of the Kaczmarz method that in each iteration 
selects a row of A with probability proportional to its Euclidean norm [2"U| [T5] . Thus 
in the standardized case we consider here, a row of A is chosen uniformly at random. 
This randomized Kaczmarz method is described by the following pseudocode. 

Algorithm 1.1: Randomized Kaczmarz 
Input: Standardized matrix A, vector b 

Output: An estimation Xk of the unique solution x to Ax = b 

Set Xo. { Trivial initial approximation } 

k <r- 

repeat 

k «- k + 1 

Select r G {1, 2, . . . , n} { Randomly select a row of A } 

Set Xf, a;fc_! + [b r — (a r ,Xk_ 1 ))a r { Perform projection } 

Note that this method as stated selects each row with replacement, see [17] for 
a discussion on the differences in performance when selecting with and without 
replacement. Strohmer and Vershynin show that this method exhibits exponential 
convergence in expectation [201 IT9j . 



(1.1) E||aj fc - x\\\ < ( 1 - — ] ||aj -aj||2) where R = \\A\\ 2 F \\A 1 



Here and throughout, || • || 2 denotes the vector Euclidean norm, || ■ || denotes the 
matrix spectral norm, || ■ ||f denotes the matrix Frobenius norm, and the inverse 
||A _1 || = inf{M : M||Acc||2 > H^lh for all x} is well-defined since A is full-rank. 
This bound shows that when A is well conditioned, the randomized Kaczmarz 
method will converge exponentially to the solution in just O(n) iterations (see 
Section 2.1 of [20] for details). The cost of each iteration is the cost of a single 
projection and takes O(n) time, so the total runtime is just 0(n 2 ). This is superior 
to Gaussian elimination which takes 0(mn 2 ) time, especially for very large systems. 
The randomized Kaczmarz method even substantially outperforms the well-known 
conjugate gradient method in many cases [2D] . 

Leventhal and Lewis show that for certain probability distributions, the expected 
rate of convergence can be bounded in terms of other natural linear-algebraic quanti- 
ties. They propose generalizations to other convex systems [TT] . Recently, Chen and 
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Powell proved that for certain classes of random matrices A, the randomized Kacz- 
marz method convergences exponentially to the solution not only in expectation but 
also almost surely [T6j . 

In the presence of noise, one considers the possibly inconsistent system Ax + w ~ b 
for some error vector w. In this case the randomized Kaczmarz method converges 
exponentially fast to the solution within an error threshold |13j . 

/ i \ */2 

(1.2) E||£E fc -x\\ 2 < ( 1 - -= J ||aj - sc|| 3 + v^RlHIoc, 

where R the the scaled condition number as in (II. ip and || ■ ||oo denotes the largest 
entry in magnitude of its argument. This error is sharp in general [13]. Modified 
Kaczmarz algorithms can also be used to solve the least squares version of this 
problem, see for example [H El El E] and the references therein. 

1.1. Coherent systems. Although the convergence results for the randomized 
Kaczmarz method hold for any consistent system, the factor -1 in the convergence 
rate may be quite small for matrices with many correlated rows. Consider for exam- 
ple the reconstruction of a bandlimited function from nonuniformly spaced samples, 
as often arises in geophysics as it can be physically challenging to take uniform sam- 
ples. Expressed as a system of linear equations, the sampling points form the rows 
of a matrix A; for points that are close together, the corresponding rows will be 
highly correlated. 

To be precise, we examine the coherence of a standardized matrix A by defining the 
quantities 

(1.3) A = A (A) = max | (a,j, a^) | and 5 = 5(A) = min | (a,j, a*.) |. 

j=ik j=ik 

Note that because A is standardized, < 8 < A < 1. It is clear that when A has 
high coherence parameters, 1 1 ^4. 1 1 1 is very small and thus the factor R in (II .ip is 
also small, leading to a weak bound on the convergence. Indeed, when the matrix 
has highly correlated rows, the angles between successive orthogonal projections are 
small and convergence is stunted. We can explore a wider range of orthogonal direc- 
tions by looking towards solution hyperplanes spanned by pairs of rows of A. We 
thus propose a modification to the randomized Kaczmarz method where each iter- 
ation performs an orthogonal projection onto a two-dimensional subspace spanned 
by a randomly-selected pair of rows. We point out that the idea of projecting in 
each iteration onto a subspace obtained from multiple rows rather than a single row 
has been previously investigated numerically, see e.g. [Sill]- 

With this as our goal, a single iteration of the modified algorithm will consist of the 
following steps. Let X\~ denote the current estimation in the /cth iteration. 

• Select two distinct rows a r and a s of the matrix A at random 

• Compute the translation parameter e 

• Perform an intermediate projection: y x^ + e(b r — (x^, a r ))a r 
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• Perform the final projection to update the estimation: x k+1 <— y + (b s — 
(y,a s ))a s 

In general, the optimal choice of e at each iteration of the two-step procedure cor- 
responds to subtracting from x k its orthogonal projection onto the solution space 
{x : (a r ,x) = b r and (a 8 ,x) = b s }, which motivates the name two-subspace Kacz- 
marz method. By optimal choice of e, we mean the value e opt minimizing the residual 
|| a? — ajfc+i|||. Expanded, this reads 



« 112- 



\\x - x k+1 \\l = \\e(b r - (x k ,a r ))(a r - (a s ,a r ) a 8 ) + x k - x + (b s - (x k ,a s ))a 

Using that the minimizer of ||7tu + is 7 = — , we see that 

- (a r - (a 8 , Or) a s , x k - x + (b s - (x k , a s ))a s ) 
° pt (b r ~ (x k ,a r ))\\a r - (a s ,a r )a s \\l 

Note that the unknown vector x appears in this expression only through its observ- 
able inner products, and so e opt is computable. After some algebra, one finds that 
the two-step procedure with this choice of Eopt can be re- written as follows [T5] . 

Algorithm 1.2: Two-subspace Kaczmarz 



Input: Matrix A, vector b 

Output: An estimation x k of the unique solution x to Ax = b 



Set Xo. { Trivial initial approximation } 

k <- 

repeat 

k <- k + 1 

Select r,s G {1,2, ... ,n} { Select two distinct rows of A uniformly at random } 
Set /ifc <— (a r , a s ) { Compute correlation } 

Set y k <— ajfc-i + (b s — (x k -i, a s ))a s { Perform intermediate projection } 
Set v k ar ~v kas { Compute vector orthogonal to a s in direction of a r } 

Set {3k < 6 j- 6a Mfc I Compute corresponding measurement } 

V i-l^fcl 



x k <- y k + (/3 fc - (y k , v k ))v k { Perform projection } 

Our main result shows that the two-subspace Kaczmarz algorithm provides the same 
exponential convergence rate as the standard method in general, and substantially 
improved convergence when the rows of A are coherent [15]. Figured] plots two iter- 
ations of the one-subspace random Kaczmarz and compares this to a single iteration 
of the two-subspace Kaczmarz algorithm. 



Theorem 1.1. Let A be a full-rank standardized matrix with n columns and m > n 
rows and suppose Ax = b. Let x k denote the estimation to the solution x in the 
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Figure 1 . For coherent systems, the one-subspace randomized Kacz- 
marz algorithm (a) converges more slowly than the two-subspace 
Kaczmarz algorithm (b). 

kth iteration of the two-subspace Kaczmarz method. Then 

E||a? - x k f 2 < ^1 - ||aj-x ||l, 

where D = min j - , A j , A and 5 are the coherence parameters fl 1 . 3 1) . and 
R = || A || A -1 1| 2 denotes the scaled condition number. 

Remarks. 1. When A = 1 or 5 = we recover the same convergence rate as 
provided for the standard Kaczmarz method ( 11. ip since the two-subspace method 
utilizes two projections per iteration. 

2. The bound presented in Theorem 11.11 is a pessimistic bound. Even when A = 1 
or 5 = 0, the two-subspace method improves on the standard method if any rows of 
A are highly correlated (but not equal) . This is evident in the proof of Theorem 11.11 
in Section [2] but we present this bound for simplicity. See also Section H] for more 
details on improved convergence bounds. 

Figure [2] shows the value of D of Theorem 11.11 for various values of A and 5. This 
demonstrates that in the best case (when 6 ~ A w 0.62), the convergence rate is 
improved by at least a factor of 0.1. 



1.2. Organization. The remainder of the report is organized as follows. In Sec- 
tion [2] we state and prove the main lemmas which serve as the proof of Theorem ll.il 
Section [3] discusses the two-subspace Kaczmarz method in the presence of noise and 
shows that in this case the method exhibits exponential convergence to an error 
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Figure 2. A plot of the improved convergence factor Dasa function 
of the coherence parameters 5 and A > 5. 

threshold. Section H] presents further modifications of the two-subspace Kaczmarz 
method which provide even more improvements on the provable convergence bounds. 
A discussion of these methods is provided in Section [61 We conclude with numerical 
experiments demonstrating the improvements from our method in Section 



2. Main Results 

We now present the proof of Theorem 11.11 We first derive a bound for the expected 
progress made in a single iteration. Since the two row indices are chosen indepen- 
dently at each iteration, we will be able to apply the bound recursively to obtain 
the desired overall expected convergence rate. 

Our first lemma shows that the expected estimation error in a single iteration of the 
two-subspace Kaczmarz method is decreased by a factor strictly less than that of 
the standard randomized method. 

Lemma 2.1. Let x k denote the estimation to the solution of Ax = b in the kth iter- 
ation of the two-subspace Kaczmarz method. Denote the rows of A by a\, a,2, ■ ■ ■ a m . 
Then we have the following bound, 

E\\x-x k \\ 2 2 < ( 1 - — J ||sr-ajfc_i||^ Y] C 2 rs Ux - x k _i, a r ) 2 + (x - x k _ x 

v ' r<s 

I I 2 

where C r>s = , ^ r,s , /j, r>s = (a r ,a s ), and R = \\A~ 1 \\ 2 \\A\\ 2 7 denotes the scaled 
condition number. 

Proof. We fix an iteration k and for convenience refer to Vj., /ifc, and as v, fi, and 
y, respectively. We will also denote 7 = (a r , v). 
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First, observe that by the definitions of v and x k we have 

x k = x k _ x + (x - x k _!, a s )a s + (x - x k _^ v)v. 
Since a s and v are orthonormal, this gives the estimate 

(2.1) \\x - x k \\ 2 2 = \\x - - \(x - x k _^a s )\ 2 - \{x - x k _^v)\ 2 

We wish to compare this error with the error from the standard randomized Kacz- 
marz method. Since we utilize two rows per iteration in the two-subspace Kaczmarz 
method, we compare its error with the error from two iterations of the standard 
method. Let z and z' be two subsequent estimates in the standard method follow- 
ing the estimate x k -i, and assume z ^ z' . That is, 

(2.2) z = cc fe _! + (b r - (x k _i,a r ))a r and z' = z + (b s - (z, a s ))a s . 

Recalling the definitions of v, \i and 7, we have 

(2.3) a r = na s + 71; with jj 2 + 7 2 = 1. 

Substituting this into (12.21) yields 

z = x k _! + fj,(x - x k _ x , a r )a s + 7(2; - a; fc _i, a r )v. 

Now substituting this into (12. 2p and taking the orthogonality of a s and v into 
account, 

z' = x k _! + (x - x k _!, a s )a s + j{x - x k _ 1: a r )v. 

For convenience, let e k -\ = x — x k _\ denote the error in the (k — l)st iteration of 
two-subspace Kaczmarz. Then we have 

\\ x ~ z '\\l = \\ e k-i - (efe_i, a s )a s - 7(e fc _i, a r )v\\\ 

= \\e k -i - (e fe _i, a s )a s - (e fe _i, v)v - (7(e fc _i, a r ) - (e fc _i, f)Hl2 

= ||e fc — x Hi - |(e fc _i, a s )| 2 - |(e fc _i,i;)| 2 + \^(e k -i,a r ) - (e fc _i,i;)| 2 . 

The third equality follows from the ortho normality of a s and v. We now expand 
the last term, 

|7(e fc _i, Or) - (e fc _i, v)\ 2 = |7(e fc _i, fia s + jv) - (e fc _i, f) | 2 

= \l 2 (e k -i,v) + 7//(e fe _i, a a ) - (e fc _i,i;)| 2 
= |/i 2 (e fc _i, w> - 7 / u(e A ._ 1 ,a s )| 2 . 

This gives 

||cc - z'\\ 2 2 = llefc.iH 2 - |(e fc _i,a s )| 2 - |(e fc _i, v)\ 2 + |/i 2 (e fc _!, «) - 7/i(e fc _i, a s ) | 2 . 

Combining this identity with (12.11) . we now relate the expected error in the two- 
subspace Kaczmarz algorithm, E||aj — x k \\\ to the expected error of the standard 
method, E|| follows: 

(2.4) E||as — aSfcUl = E||as - z'\\\ - E|/i 2 (e fe _i, v) - 7/i(e fc _i, a s }| 2 . 
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It thus remains to analyze the last term. Since we select the two rows r and s 
independently from the uniform distribution over pairs of distinct rows, the expected 
error is just the average of the error over all m 2 — m ordered choices r,s. To this 
end we introduce the notation ii r s = (a r , a s ). Then by definitions of v, fi and 7, 

E|/i 2 (e fc _i,i;) -7/i(e fc _!,a s )| 2 



1 


m 2 




m 




1 




m 2 




m 




1 





:((e fc _i, a r ) - // rjS (e fe _i, a s )) - ^ iS Wl - /j, 2 , (e fc _i, a s ) 



Mr, 



I — /I, 



A,s 



\ e fc — 1; Or) 



1-/1 



2 



+ /i r , S \/l - Mrs ( e fc-l, a s ) 



1 — /i 



2 



(e fc _i, a s ) 



We now recall that for any 9,7r,u, and v, 

(6u - Tivf + (9v - nu) 2 > (|tt| - |#|) 2 (w 2 + w 2 ). 

Setting = , r,s and 7r rjS = / AV,S , we have by rearranging terms in the 



~~2 ' 

symmetric sum, 



E|/i 2 (e fc _i, 0) - 7/x(e fc _i, a a )| : 



(2.5) 



m 2 




m 




1 




m 2 




m 




1 




m 2 




m 




1 




m 2 




m 



r<s 



r<s 



r<s 



(^=4^) 2 ((e k -u a r )) 2 + ((e^u a s }) 2 ) . 



Since selecting two rows without replacement (i.e. guaranteeing not to select the 
same row back to back) can only speed the convergence, we have from (11.11) that 
the error from the standard randomized Kaczmarz method satisfies 

E||sc - z'\\l < (1 - ^/Rf\\x - Xfc.xll 2 . 
Combining this with H2.4[) and (12. 5p yields the desired result. 
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Although the result of Lemma 12.11 is tighter, using the coherence parameters 5 and 
A of (jl.3p allows us to present the following looser but simpler result. 

Lemma 2.2. Let x k denote the estimation to Ax = b in the kth iteration of the 
two-subspace Kaczmarz method. Denote the rows of A by a\, a 2 , . . . a m . Then 

E||sc - x k \\\ < {^l - -J[j W x ~ x k-i\\l, 

where D = min | 5 , A 1 ^~ A ^ V 5 and A are the coherence parameters as in (11.31) . 
and R = || A _1 || 2 || A\\"p denotes the scaled condition number. 

Proof. By Lemma [2. II we have 
(2.6) 

IE||£C — £c fe 111 < I 1 - — I ||aj-x fc _i||^ = y^Cl s Ux-x k _ 1 ,a r ) 2 + {x-x k _ 1 ,a s ) 2 ) 

\ R I mr — m L — ' 

x 7 r<s 

where 

_ \(a r ,a 8 ) \ - (a r ,a s ) 2 
a/1 - (a r , a s ) 2 

By the assumption that 5 < \ (a r , a s ) \ < A, we have 

**-{^^}-» 

Thus we have that 

— ^Cr,s {( X ~ X h-1, «r) 2 + (X ~ X k -1, Cl s ) 2 ) 



m 2 — m 



r<s 



> —5— y~](( x ~ x k-i, a r) 2 + (x - x k _x, a s ) 2 ) 



rrr — m 

r<s 



D{m - 1) >A ; 

2_^{x - aj fc _i,a P ) 



(2.7) > 



m 2 — m 

r=l 

D \\x — x k . - 
m ||A~ 1112 



-1\\2 



12 

In the last inequality we have employed the fact that for any z, 



J2(z,a r )' 



> 



Combining f |2.Tj) and (12. 6p along with the definition of R yields the claim. 
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□ 



Applying Lemma l2~2l recursively and using the fact that the selection of rows in each 
iteration is independent yields our main result Theorem 11.11 



3. Noisy Systems 



Next we consider systems which have been perturbed by noise. The inconsistent 
system b = Ax now becomes (the possibly inconsistent system) b = Ax + w for 
some error vector w. As evident from (jl.2p . the standard method with noise exhibits 
exponential convergence down to an error threshold, which is proportional to H^Hoo. 
Our main result in the noisy case is that the two-subspace version again exhibits 
even faster exponential convergence, down to a threshold also proportional to HioH^. 

Theorem 3.1. Let A be a full rank matrix with m rows and suppose b = Ax + w 
is a noisy system of equations. Let x k denote the estimation to the solution x in 
the kth iteration of the two-subspace Kaczmarz method. Then 

■mil i| Jt. /"a 1 1 II 3 1 1 W | Im 

E||cc - X k \\ 2 < 7) 1 \\x- x \\ 2 



where r\ = (l — -^) 2 — ^, D = min j 5 , A /^f^ j , A and 5 are the coherence 
parameters (jl.3p . and R = ||A -1 || 2 ||A|||> denotes the scaled condition number. 



As in the case of our main result Theorem 11.11 this bound is not tight. The same 
improvements mentioned in the remarks about Theorem 11.11 can also be applied 
here. In particular, the dependence on A seems to be only an artifact of the proof 
(see Section Nonetheless, this result still shows that the two-subspace Kaczmarz 
method provides expected exponential convergence down to an error threshold which 
is analagous to that of the standard method. The convergence factors are again 
substantially better than the standard method for coherent systems. 



Proof of Theorem \3.1\ Fix an iteration k and denote by y, v, \i and the values of 
Uk, v k, an d 0k f° r convenience. Let y' and 0' be the values of y^, and 0k as if 
there were noise (i.e. w — 0). In other words, we have 

y = y + w r a r and = + 



V 1 - v- 

Then by the definition of x^, we have 

* . . W r + fiW s 

Xk = x k + w r a r H v, 
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where x k = y' + (/?' — (x^-i, denotes the next estimation from a?fc_i if there 
were no noise. Therefore, we have that 

\\X — x k\\2 S \\x — x k \\2 + \\w r a r H 



< \\X - X%\\ 2 + |Wr|||ar||2 + 



l-/i 2 

W r + fiW s 



x - x* k \\ 2 + \w r \ + 



W r + jJiW s 



\ v \\2 



1 - /i 2 



<n«-»;ih. 1 



Vi^A 2 

By Jensen's inequality and Lemma 12.11 

E||a; - x%\\ 2 < yffjE\\x - £c fe _i|| 2 . 



Combining the above recursively yields 

E||a5 - x k \\ 2 < r] k/2 \\x - x \\ 2 + 

< 77 fc/2 ||£C — £C ||2 + 

which proves the claim. 



3||io| 



fc-i 



/2 



vT^A 2 " ^ 

3 



w\ 



i-y/fj v/t^a 2 " 



□ 



4. Further improvements 

Next we state and prove a lemma which demonstrates even more improvements on 
the convergence rate from the standard method in the case where the correlations 
between the rows are non-negative. If this is not the case, we may alter one step 
of the two-subspace method to generalize the result to matrices with arbitrary cor- 
relations. This modification will decrease the factor yet again in the exponential 
convergence rate of the two-subspace method. We consider the noiseless case here, 
although results analagous to those in Section [3] can easily be obtained using the 
same methods. 

We first define an m 2 x n matrix f2 whose rows oj are differnces of the rows of A: 



(4.1) 



m(j-l)+i 



o. 



j,i = l,...,m, j^i, 
J = i 



We may now state our main lemma. 



12 



DEANNA NEEDELL AND RACHEL WARD 



Lemma 4.1. Let x k denote the estimation to Ax = b in the kth iteration of the 
two-subspace Kaczmarz method. For indices r and s, set /i r)S = (a r ,a s ). We have 
the following bound, 



E\\x-x k \\ 2 2 < ( 1 - — ) ||sc - x k -i\\l 5^ C r . tS ((x - x k _!,a r ) 2 + (x - x k 

v 7 r<s 

(4.2) 

j m 



m 2 — m 



where C r , s = and E m = A rf,y 

Remark. If the correlations ) between the rows of A are non-negative, 

then the constants Eij are all non-negative and thus this lemma offers a strict im- 
provement over Lemma [2~T1 However, if this is not the case, this bound may actually 
be worse than that of Lemma 12.11 To overcome this, we may simply modify the 
two-subspace Kaczmarz algorithm so that in each iteration yL r>s is non-negative (by 
possibly using — a r instead of a r when needed for example). This modification seems 
necessary only for the proof, and empirical results for the modified and unmodified 
methods remain the same. From this point on, we will assume this modification is 
in place. 



Proof of Lemma J^.l. We again fix an iteration k and for convenience refer to v k , [i k , 
and y k as v, \i, and y, respectively. We will also let e fe _x = x — x k _ x be the error 
in the (k — l)st iteration. 

By Q23D and (O]) . we have that 

(4.3) E||x - x k \\\ = (1 - l/i2)||ai - x k _ x \\\ + E|/i 2 (e fe _i, v) - 7/i(e fc _i, a s )\ 2 . 



We now analyze the last term carefully. To take expectation we must look over all 
combinations of choices r,s, so to that end denote (a r ,a s ) by fi rs . Since we select 
two rows uniformly at random (with replacement), using the definitions of c, \x and 
7, we have 



13 



E\fi 2 (e k -i, c) - 7^(e fe _i, a s 



— E 
— E 



— m 



A*r,a 



:((e fc _i,o r ) - fi r<s (e k -i, a s )) - /u r ,a\/l - M 2 s (e fc „i, a s ) 



1-M 



(ejfe-i, o r ) 



Mr,, 



1-M 



2 



+ Mr,s\/l - Mrs ( e fc-l7 «s) 



^ 1 ^9 f^r.a (ejfc-i, o r ) - (e fc _i, a s )) 

— m ^ 1 — u 2 V / 



Now observe that 

{^r tS (e k -i,a r ) - (e fc _i,a s )) 2 + (/i r>s (e fe _i, a s ) - (e fc _i,a r )) 2 
= (1 - iJL r>s ) 2 {{e k -i,a r ) 2 + (e fc _i,a«) 2 J + 2// fv ,((e fc _i,a P - a s ) 

= (1 - /i rjS ) 2 ((e fc _i, a r } 2 + (e fc _i, a s ) 2 ) + 4/v, s (l - yU 2 J f (e fc _i, ar _ ) 

N / Chip CL g 

Thus taking advantage of the symmetry in the sum we have, 



E|/i 2 (e fc _i, c) - 7/i(e fc _i,a s )| 2 
(4.4) 



(e fe _i,o r ) +(e fc _i,a s ) + 4/i rs I (e fc _i, -) ) 



Combining with (14. 3ft yields the claim. 

□ 

We may now use the coherence parameters S and A from (II. 3p to obtain the following 
simplified result. 

Lemma 4.2. Let Xk denote the estimation to Ax = b in the kth iteration of the 
two-subspace Kaczmarz method. Then, 

1.2 ( ( 1 \ 2 D E\ .. Il2 
E \\x — xu L < 1 a; — ccfe_i L, 

11 112 - I V W R Q 
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where D = min l^fi-, ^Tf^}> E = 463 and R = II fll and Q = 
|| 2 ||ri||^ denote the scaled condition numbers of A and Q (from ( 14. ip . respec- 
tively. 

Proof. In light of Lemma 14.11 and the proof of Lemma \2.2\ it suffices to show that 
1 m E 

— Eij(x - Xk-x^mrj^+i) 2 > — \\x - Xk-i\\l. 

m z — m *— ' (J 

j>=i 

By the definition (11.31) of 5 and ||f2 _1 ||, we have 

A5 3 



1 \ 4(5 J \ 

fib lib lib 

A5 3 \\x-x k -.i\\l 



> 

m 2 — m \\Q 1 1| 

E 
> —. 

~ Q 

The last equality follows since the rows of Q are unit norm and equal to zero for 
i = j. 

□ 

Applying Lemma 14.21 recursively yields our main theorem. 

Theorem 4.3. Let x^ denote the estimation to Ax = b in the kth iteration of the 
two -sub space Kaczmarz method. Then, 

, 1\ 2 D E\ k „ ll2 
E aj-asfcHa < I 1- — - — - — ||x-aso | 2 , 



R) R Q 



where D = min j , } , E = 45 3 and R = || A' 1 \\ 2 \\ A\\% and Q = 

\\Q~ 1 \\ 2 \\Q\\ 2 F denote the scaled condition numbers of A and ft (from (14. ip . respec- 



tively. 



5. Numerical Results 



Next we perform several experiments to compare the convergence rate of the two- 
subspace randomized Kaczmarz with that of the standard randomized Kaczmarz 
method. As discussed, both methods exhibit exponential convergence in expec- 
tation, but in many regimes the constant factor in the exponential bound of the 
two-subspace method is much smaller, yielding much faster convergence. 
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To test these methods, we construct various types of 500 x 50 matrices A. To 
get a range of 5 and A, we set the entries of A to be independent indentically 
distributed uniform random variables on some interval [c, 1]. Changing the value 
of c will appropriately change the values of 6 and A. Note that there is nothing 
special about this interval, other intervals (both negative and positive or both) 
of varying widths yield the same results. For each matrix construction, both the 
randomized Kaczmarz and two-subspace randomized methods are run with the same 
initial (randomly selected) estimate. The estimation errors are computed at each 
iteration. Since each iteration of the two-subspace method utilizes two rows of the 
matrix A, we call a single iteration of the standard method two iterations in the 
Algorithm 11.11 for fair comparison. 

Figure [3] demonstrates the regime where the two-subspace method offers the most 
improvement over the standard method. Here the matrix A has highly coherent 
rows, with 5 ~ A « 1. 




100 200 300 400 
Iterations 



Figure 3. A log-linear plot of the error per iteration for the ran- 
domized Kaczmarz (RK) and two-subspace RK (2SRK). Matrix A 
has uniformly distributed highly coherent rows with 5 = 0.992 and 
A = 0.998. 



Our result Theorem 1 1 . 1 1 suggest s that as 5 becomes smaller the two-subspace method 
should offer less and less improvements over the standard method. When 5 = the 
convergence rate bound of Theorem 1 1.1 1 is precisely the same as that of the standard 
method (11.11) . Indeed, we see this precise behavior as is depicted in Figure SJ 

Next we performed experiments on noisy systems. We used the same dimensions 
and construction of the matrix A as well as the signal type. Then we added i.i.d. 
Gaussian noise with norm 0.1 to the measurements b. Figure [5] demonstrates the 
exponential convergence of the methods in the presence of noise for various values 
of 5 and A. 
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50 100 150 200 250 50 100 150 200 

(c) Iterations (fl) Iterations 



Figure 4. A log-linear plot of the error per iteration for the ran- 
domized Kaczmarz (RK) and two-subspace RK (2SRK). Matrix A 
has uniformly distributed highly coherent rows with (a) 5 = 0.837 
and A = 0.967, (b) 5 = 0.534 and A = 0.904, (c) 5 = 0.018 and 
A = 0.819, and (d) 5 = and A = 0.610. 

Plots (a), (b), and (c) demonstrate exponential convergence to the error threshold 
(or below) with improvements over the standard method. Plot (d) shows the semi- 
convergence effect of the two-subspace Kaczmarz method. It is an open problem to 
determine at which point to terminate the method for optimal error without knowl- 
edge of the solution x. One option is to simply to terminate after 0(n 2 ) iterations, 
as this is the amount of iterations needed for convergence (see the discussion in [20] ) . 
This is of course not optimal, and as Figure [6] shows, using this halting criterion 
may cause the two-subspace method to perform worse than the standard Kaczmarz 
method. We leave overcoming this challenge for future work. 

6. Discussion 

As is evident from Theorems 11.11 and 14. 3[ the two-subspace Kaczmarz method pro- 
vides exponential convergence in expectation to the solution of Ax = b. The con- 
stant in the rate of convergence for the two-subspace Kaczmarz method is at most 
equal to that of the best known results for the randomized Kaczmarz method (11. ip . 
When the matrix A has many correlated rows, the constant is significantly lower 
than that of the standard method, yielding substantially faster convergence. This 




500 1000 1500 2000 2500 500 1000 1500 2000 2500 

Iterations (h) Iterations 




Figure 5. A log-linear plot of the error per iteration for the ran- 
domized Kaczmarz (RK) and two-subspace RK (2SRK). Matrix A 
has uniformly distributed highly coherent rows with (a) 5 = 0.651 
and A = 0.932, (b) 5 = 0.933 and A = 0.985, (c) 5 = 0.964 and 
A = 0.992, and (d) 5 = 0.965 and A = 0.991. 




Figure 6. A plot of the error after 500 iterations for the random- 
ized Kaczmarz (RK) and two-subspace RK (2SRK). Matrix A has 
uniformly distributed highly coherent rows with (a) 5 = 0.981 and 
A = 0.995, and (b) 5 = 0.993 and A = 0.998. 
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has positive implications for many applications such as nonuniform sampling in 
Fourier analysis, as discussed in Section [TJ 

We emphasize that the bounds presented in our main theorems are weaker than 
what we actually prove, and that even when 6 is small, if the rows of A still have 
many correlations, Lemmas 12.11 and 14.11 still guarantee improved convergence. For 
example, if the matrix A has correlated rows but contains a pair of identical rows and 
a pair of orthogonal rows, it will of course be that 5 = and A = 1. However, we see 
from the proofs of our main theorems that the two-subspace method still guarantees 
substantial improvement over the standard method. Numerical experiments in cases 
like this produce results identical to those in Section 

It is clear both from the numerical experiments and Theorem 11.11 that the two- 
subspace Kaczmarz performs best when the correlations (a r , a s ) are bounded away 
from zero. In particular, the larger 5 is the faster the convergence of the two-subspace 
method. The dependence on A, however, is not as straightforward. Theorems 11.11 
and 14.31 suggest that when A is very close to 1 the two-subspace method should 
provide similar convergence to the standard method. However, in the experiments 
of Section [5] we see this is not the case. This dependence on A appears to be only 
an artifact of the proof. 

6.1. Noisy systems. As is the case for many iterative algorithms, the presence 
of noise introduces complications both theoretically and empirically. Theorem 13.11 
guarantees expected exponential convergence to the noise threshold. For pessimistic 
values of D, the noise threshold provided by Theorem 13.11 is greater than that of 
the standard method, (jl.2p . by a factor of \/R. In addition, large values of A 
produce large error thresholds in this bound. As in the noiseless case, we believe 
this dependence on A to be an artifact of the proof. 

A further and important complication that noise introduces is a semi-convergence 
effect, a well-known effect in Algebraic Reconstruction Technique (ART) methods 
(see e.g. [S]). For example, in Figure E](d), the estimation error for the two-subspace 
method decreases to a point and then begins to increase. It remains an open problem 
to determine an optimal stopping condition without knowledge of the solution x. 

6.2. Future Work. The issue of detecting semiconvergence is a very deep prob- 
lem. The simple solution would be to terminate the algorithm once the residual 
IIAajfc — &H2 decreases below some threshold. However, the residual decreases in 
each iteration even when the estimation error begins to increase. Determining the 
residual threshold beyond which one should terminate is not an easy problem and 
work in this continues to be done. 

We also hope to improve the error threshold bound of Theorem 13.11 for the two- 
subspace method. We conjecture that the (1 — A) term can be removed or im- 
proved, and that the dependence on R can be reduced to y/R in the error term of 
Theorem 13.11 
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Finally, a natural extension to our method would be to use more than two rows 
in each iteration. Indeed, extensions of the two-subspace algorithm to arbitrary 
subspaces can be analyzed [H]. 
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